Two credit scoring models based on dual strategy ensemble trees

Authors

  • Gang Wang
  • Jian Ma
  • Lihua Huang
  • Kaiquan Xu
Abstract

Decision tree (DT) is one of the most popular classification algorithms in data mining and machine learning. However, DT-based credit scoring models often perform worse than models built with other techniques. This is mainly because, in the credit scoring setting, DT is easily affected by (1) noisy data and (2) redundant attributes. In this study, we propose two dual strategy ensemble trees, RS-Bagging DT and Bagging-RS DT, which combine two ensemble strategies, bagging and random subspace, to reduce the influence of noisy data and redundant attributes and to achieve higher classification accuracy. Two real-world credit datasets are used to demonstrate the effectiveness and feasibility of the proposed methods. Experimental results reveal that single DT has the lowest average accuracy of the five single classifiers compared, the other four being Logistic Regression Analysis (LRA), Linear Discriminant Analysis (LDA), Multi-layer Perceptron (MLP) and Radial Basis Function Network (RBFN). Moreover, RS-Bagging DT and Bagging-RS DT outperform the five single classifiers and four popular ensemble classifiers, i.e., Bagging DT, Random Subspace DT, Random Forest and Rotation Forest. The results show that RS-Bagging DT and Bagging-RS DT can be used as alternative techniques for credit scoring. © 2011 Elsevier B.V. All rights reserved.
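The abstract names the two ingredients, bagging (resampling rows) and random subspace (sampling attributes), but does not spell out how they are nested. The sketch below is one plausible reading, not the authors' implementation: it assumes RS-Bagging draws an attribute subset first and bags inside it, while Bagging-RS draws a bootstrap sample first and varies attribute subsets inside it. The base learner is a one-level tree (stump) to keep the example short, and the names `train_stump`, `bootstrap`, `dual_ensemble`, `predict` and the toy data are all hypothetical.

```python
import random
from collections import Counter

def train_stump(rows, labels, features):
    """Fit a one-level decision tree using only the given attribute indices."""
    best = None  # (training errors, attribute, threshold, left label, right label)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    _, f, t, l_lab, r_lab = best
    return lambda r: l_lab if r[f] <= t else r_lab

def bootstrap(rows, labels, rng):
    """Bagging step: sample rows with replacement, same size as the input."""
    idx = [rng.randrange(len(rows)) for _ in rows]
    return [rows[i] for i in idx], [labels[i] for i in idx]

def dual_ensemble(rows, labels, n_outer, n_inner, n_features, order, rng):
    """Build n_outer * n_inner stumps; `order` picks the assumed nesting."""
    if order not in ("rs-bagging", "bagging-rs"):
        raise ValueError("order must be 'rs-bagging' or 'bagging-rs'")
    all_features = range(len(rows[0]))
    members = []
    for _ in range(n_outer):
        if order == "rs-bagging":
            # Assumed: fix an attribute subset, then bag within it.
            feats = rng.sample(all_features, n_features)
            for _ in range(n_inner):
                b_rows, b_labels = bootstrap(rows, labels, rng)
                members.append(train_stump(b_rows, b_labels, feats))
        else:
            # Assumed: fix a bootstrap sample, then vary attribute subsets.
            b_rows, b_labels = bootstrap(rows, labels, rng)
            for _ in range(n_inner):
                feats = rng.sample(all_features, n_features)
                members.append(train_stump(b_rows, b_labels, feats))
    return members

def predict(members, row):
    """Combine member predictions by unweighted majority vote."""
    return Counter(m(row) for m in members).most_common(1)[0][0]

# Toy data (made up): attribute 0 is informative, attributes 1 and 2 are noise.
rng = random.Random(0)
rows = [[x, rng.random(), rng.random()] for x in range(10)]
labels = [int(r[0] >= 5) for r in rows]
rs_bag = dual_ensemble(rows, labels, 3, 4, 2, "rs-bagging", rng)
bag_rs = dual_ensemble(rows, labels, 3, 4, 2, "bagging-rs", rng)
print(len(rs_bag), predict(rs_bag, rows[0]), predict(bag_rs, rows[-1]))
```

In both orderings, each member sees a resampled set of applicants and a reduced set of attributes, which is how the abstract motivates the method: resampling dampens the effect of noisy records, and attribute sampling dampens the effect of redundant attributes.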


Similar resources

Investigating the missing data effect on credit scoring rule based models: The case of an Iranian bank

Credit risk management is a process in which banks estimate probability of default (PD) for each loan applicant. Data sets of previous loan applicants are built by gathering their data, and these internal data sets are usually completed using external credit bureau’s data and finally used for estimating PD in banks. There is also a continuous interest for bank to use rule based classifiers to b...


Improving experimental studies about ensembles of classifiers for bankruptcy prediction and credit scoring

Previous studies about ensembles of classifiers for bankruptcy prediction and credit scoring have been presented. In these studies, different ensemble schemes for complex classifiers were applied, and the best results were obtained using the Random Subspace method. The Bagging scheme was one of the ensemble methods used in the comparison. However, it was not correctly used. It is very important...


Credit scoring with boosted decision trees

The enormous growth experienced by the credit industry has led researchers to develop sophisticated credit scoring models that help lenders decide whether to grant or reject credit to applicants. This paper proposes a credit scoring model based on boosted decision trees, a powerful learning technique that aggregates several decision trees to form a classifier given by a weighted majority vote o...


Large Unbalanced Credit Scoring Using Lasso-Logistic Regression Ensemble

Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal...


Improving the management of microfinance institutions by using credit scoring models based on Statistical Learning techniques

A wide range of supervised classification algorithms have been successfully applied for credit scoring in non-microfinance environments according to recent literature. However, credit scoring in the microfinance industry is a relatively recent application, and current research is based, to the best of our knowledge, on classical statistical methods. This lack is surprising since the implementat...



Journal:
  • Knowl.-Based Syst.

Volume 26

Publication date: 2012